A novel optimization rainfall coupling model based on stepwise decomposition technique

Traditional decomposition-integration models decompose the original sequence into subsequences, which are then proportionally divided into training and testing periods for modeling. Because decomposition may cause data aliasing, the decomposed training period may contain part of the test period data. A more effective sample construction method is therefore sought in order to validate the model prediction accuracy reliably. Semi-stepwise decomposition (SSD), full stepwise decomposition (FSD), single model semi-stepwise decomposition (SMSSD), and single model full stepwise decomposition (SMFSD) techniques were used to create the samples. This study integrates Variational Mode Decomposition (VMD), the African Vulture Optimization Algorithm (AVOA), and the Least Squares Support Vector Machine (LSSVM) to construct a coupled rainfall prediction model. The influence of the VMD penalty factor α is examined, and the most suitable stepwise decomposition machine learning coupled model for each station in the North China Plain is selected. The results reveal that SMFSD is comparatively the most suitable technique for monthly precipitation forecasting in the North China Plain. Among the predictions for the five stations, the best overall performance is observed at Huairou Station (RMSE of 18.37 mm, NSE of 0.86, MRE of 107.2%) and Jingxian Station (RMSE of 24.74 mm, NSE of 0.86, MRE of 51.71%), while Hekou Station exhibits the poorest performance (RMSE of 25.11 mm, NSE of 0.75, MRE of 173.75%).


African vulture optimisation algorithm (AVOA)
The AVOA starts by assuming that there are N vultures in the search space. It first employs a grouping strategy to enhance population diversity: the two vultures with the best (optimal) and second-best (suboptimal) fitness values are grouped together, and the remaining (N − 2) vultures search for food around these top two 24. The iterative process of the AVOA is as follows.
Phase 1: Randomly initialise the population, and then select the best or second-best individual for the next stage of optimisation according to the "roulette" rule. For the ith vulture in the population, its learning object is selected according to Eq. (1), where L is a user-defined parameter in (0, 1) and rand is a uniformly distributed random number in [0, 1]. Population diversity increases as L tends to 0; conversely, as L tends to 1, population aggregation accelerates.
Phase 2: Define the starvation rate F, which governs the transition between the exploration and exploitation processes. As the iterations advance, the magnitude of F tends to decrease, which makes the exploitation process increasingly likely.
where h and z are uniformly distributed random numbers in [−2, 2] and [−1, 1], respectively; w is a user-defined parameter that controls the probability that the algorithm enters the exploration mode in the final stage; and iter is the current iteration number.
Phase 3: Spatial exploration. The AVOA uses a user-defined parameter P1, which takes values in (0, 1), to determine which exploration mode to enter.
where P(i + 1) is the updated position of the vulture and X is a uniformly distributed random number in [0, 2].
Phase 4: Local exploitation. The AVOA initiates the exploitation phase when the absolute value of the starvation rate satisfies |F| < 1. Unlike the exploration phase, this phase contains two subphases, which are demarcated by |F| = 0.5.
Subphase 1 judgement condition: |F| ≥ 0.5. In this subphase, the position update mimics the characteristics of the vulture's spiral flight and is executed according to Eqs. (6) to (9), where the parameter P2 takes a value in (0, 1).
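The phase switching described above can be sketched in a few lines of Python. The expression used for F below follows the commonly cited original AVOA formulation and is an assumption here, since the corresponding equations are not reproduced in this text; the default `w = 2.5` and the function names are likewise illustrative choices, not values taken from this study.

```python
import math
import random


def starvation_rate(it, max_iter, w=2.5, rng=random):
    """Starvation rate F (assumed form, after the original AVOA formulation)."""
    progress = it / max_iter
    h = rng.uniform(-2.0, 2.0)  # h ~ U[-2, 2]
    z = rng.uniform(-1.0, 1.0)  # z ~ U[-1, 1]
    # Disturbance term; w shapes how likely exploration is late in the run.
    t = h * (math.sin(math.pi / 2 * progress) ** w
             + math.cos(math.pi / 2 * progress) - 1.0)
    return (2.0 * rng.random() + 1.0) * z * (1.0 - progress) + t


def select_phase(F):
    """Map |F| to the phase thresholds stated in the text."""
    magnitude = abs(F)
    if magnitude >= 1.0:
        return "exploration"
    if magnitude >= 0.5:
        return "exploitation-subphase-1"
    return "exploitation-subphase-2"
```

Under this assumed form, F is zero at the final iteration, so the last steps of a run always fall in the exploitation phase.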

Variational mode decomposition (VMD)
VMD is a commonly used adaptive, non-recursive signal processing method 25. It first requires the number of modes, K, and the quadratic penalty factor, α, to be specified, and then iteratively searches for the optimal centre frequency and finite bandwidth of each mode, so that each intrinsic mode function (IMF) is matched adaptively and the IMFs are separated effectively 26. The kth IMF (k = 1, ..., K) can be written as $u_k(t) = A_k(t)\cos(\phi_k(t))$, where $A_k(t)$ is the instantaneous amplitude function and $\phi_k(t)$ is the non-decreasing instantaneous phase function.
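For reference, the constrained variational problem that VMD solves can be sketched in its commonly cited form as follows. This is the standard formulation, not a reproduction of the equations omitted from this text, and the symbols f(t) (the input series), ω_k (centre frequency of mode k) and λ(t) (Lagrange multiplier) are assumptions introduced for illustration.

```latex
\begin{aligned}
\min_{\{u_k\},\{\omega_k\}} \quad & \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \\
\text{s.t.} \quad & \sum_{k=1}^{K} u_k(t) = f(t)
\end{aligned}

\mathcal{L}\left(\{u_k\},\{\omega_k\},\lambda\right)
= \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2
+ \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2
+ \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle
```

In this form, a larger α weights the bandwidth term more heavily, so each mode is pushed toward a narrower frequency band.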

Least squares support vector machine (LSSVM)
LSSVM is an improved algorithm based on SVM 27. The classical SVM follows the structural risk minimisation principle: by introducing a loss function and relaxation variables, the fitting problem is transformed into a quadratic optimisation problem. The improvement made by the LSSVM is that the inequality constraints in this optimisation problem are converted into equality constraints, and an optimal objective function is constructed in which C is the regularisation parameter and δ_i is the ith relaxation variable. By introducing the Lagrange multipliers α_i, the Lagrange function L is formed, and the optimality conditions are obtained by partial differentiation with respect to w, b, δ and α, respectively; in the resulting linear system, I is the unit matrix. A kernel function K(x, x_i) = ⟨φ(x), φ(x_i)⟩ is chosen to reduce the computational effort, and the regression equation of the LSSVM model is then finally determined.
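As a rough illustration of why the equality constraints simplify training, the following sketch fits an LSSVM regressor by solving a single linear system. It assumes the textbook objective ½‖w‖² + (C/2)Σδ_i², which leads to the system [[0, 1ᵀ], [1, K + I/C]]·[b; α] = [0; y]; the RBF kernel, the default values of `C` and `sigma`, and all function names are assumptions made for illustration, not the configuration used in this study.

```python
import math


def rbf_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [yi] for row, yi in zip(A, y)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def lssvm_fit(X, y, C=100.0, sigma=1.0):
    """Return (b, alpha) from the LSSVM linear system."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / C  # ridge term I / C on the kernel diagonal
        A.append(row)
    solution = solve_linear(A, [0.0] + list(y))
    return solution[0], solution[1:]


def lssvm_predict(x, X, alpha, b, sigma=1.0):
    """Regression output: sum_i alpha_i * K(x, x_i) + b."""
    return sum(a * rbf_kernel(x, xi, sigma) for a, xi in zip(alpha, X)) + b
```

In a coupled model of the kind described here, an optimiser such as the AVOA would typically search over hyperparameters like `C` and `sigma`; that search is not shown in this sketch.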

Monthly precipitation prediction model
Prediction steps
The construction steps of the monthly precipitation prediction model based on the AVOA and VMD are as follows:
Step 1: Use the VMD algorithm to decompose the precipitation sequence according to the chosen stepwise decomposition sample construction method, resulting in K modal components.
Step 2: To describe the antecedent influencing factors of each component precisely, determine the lag in months (lag_k) of the kth modal component from the autocorrelation function (ACF) and partial autocorrelation function (PACF). Taking Huairou Station as an example, six modal components are obtained, with lag values of [2, 7, 4, 6, 6, 3].
Step 3: Generate training and testing samples according to the chosen stepwise decomposition sample construction method, with a training sample ratio of 0.8 and the remainder used as testing samples. Normalize the samples using parameters derived from the training set.
Step 4: Feed the training samples into the prediction model for training.
Step 5: Evaluate the accuracy and performance of the model using the evaluation metrics. The flowchart of the monthly precipitation prediction model is shown in Fig. 1.
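Steps 2 and 3 can be illustrated with a minimal sketch that builds lagged samples for one modal component, splits them chronologically at a ratio of 0.8, and scales everything with training-set statistics only. Min-max scaling is an assumption here, since the normalization method is not specified in the text, and the function names are hypothetical.

```python
def build_lagged_samples(series, lag):
    """Inputs are the `lag` preceding values; the target is the next value."""
    X = [list(series[t - lag:t]) for t in range(lag, len(series))]
    y = [series[t] for t in range(lag, len(series))]
    return X, y


def split_and_scale(X, y, train_ratio=0.8):
    """Chronological split, then min-max scaling with training statistics only."""
    n_train = int(len(X) * train_ratio)
    train_values = [v for row in X[:n_train] for v in row] + list(y[:n_train])
    lo, hi = min(train_values), max(train_values)
    span = (hi - lo) or 1.0  # guard against a constant training series

    def scale(v):
        return (v - lo) / span

    X_scaled = [[scale(v) for v in row] for row in X]
    y_scaled = [scale(v) for v in y]
    return (X_scaled[:n_train], y_scaled[:n_train],
            X_scaled[n_train:], y_scaled[n_train:])
```

Because the scaling parameters come from the training period alone, scaled test values may fall outside [0, 1]; this is expected and avoids passing test-period information into training.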

Evaluation metrics
Model error evaluation indexes include the root mean square error (RMSE), the Nash efficiency coefficient (NSE), and the mean relative error (MRE). Lower RMSE and MRE values and an NSE closer to 1 indicate higher prediction accuracy. The consistency index (IA), which ranges from 0 to 1, reflects the generalization ability; the closer it is to 1, the better the model prediction performance. The U-statistic (U1) evaluates the prediction ability; the closer it is to 0, the stronger the model's predictive power. The calculation formulas are as follows:
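Since the formulas themselves are not reproduced in this text, the sketch below uses the standard definitions of these five metrics as an assumption: Willmott's index of agreement for IA and Theil's U1 for the U-statistic. Skipping months with zero observed rainfall in the MRE is also an illustrative choice, made only to avoid division by zero.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash efficiency coefficient; 1 is a perfect fit."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def mre(obs, pred):
    """Mean relative error in percent; zero observations are skipped."""
    pairs = [(o, p) for o, p in zip(obs, pred) if o != 0]
    return 100.0 * sum(abs(p - o) / abs(o) for o, p in pairs) / len(pairs)


def ia(obs, pred):
    """Index of agreement (Willmott form); ranges from 0 to 1."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(obs, pred))
    return 1.0 - num / den


def u1(obs, pred):
    """Theil's U1 statistic; 0 is a perfect forecast."""
    n = len(obs)
    den = (math.sqrt(sum(o * o for o in obs) / n)
           + math.sqrt(sum(p * p for p in pred) / n))
    return rmse(obs, pred) / den
```

The relative error grows quickly when the observed value is small, which is worth keeping in mind when reading MRE values for months with little rainfall.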

Data source
The North China Plain covers a total area of 300,000 km² and belongs to the continental monsoon climate zone, with four distinct seasons and an average annual precipitation of 500-900 mm. Its alluvial plains are characterized by comparatively flat topography, with the majority of elevations below 50 m 28. As one of China's most significant grain production bases, the North China Plain is crucial to the country's food security. Predictive rainfall simulation in this area can support timely and efficient decisions on agricultural production and water resource management 29.
The precipitation data used in this study were provided by the national meteorological stations of the North China Plain (https://data.cma.cn/). For Hekou Station, precipitation data spanning January 1993 to December 2018 are available. For the remaining stations, precipitation data from January 1973 to December 2018 are available.

Sample construction
The ratio of training to testing samples for the model was taken as 4:1. The training period was from August 1972 to July 2009 and the testing period was from August 2009 to December 2018, except for Hekou Station (training period: July 1992 to July 2013; testing period: August 2013 to December 2018).
Constructing correct and effective training and test samples is the key to accurate precipitation prediction. Commonly employed sample construction methods suited to practical application include the "Semi-stepwise Decomposition (SSD)" sample technique 30, the "Fully Stepwise Decomposition (FSD)" sample technique 31, the "Single-model Semi-stepwise Decomposition (SMSSD)" sample technique 32, and the "Single-model Fully Stepwise Decomposition (SMFSD)" sample technique 20. The first two methods require the simultaneous construction of K models, where K is the number of modal components. In contrast, the latter two methods need only a single model to obtain the final prediction result, and therefore run faster.
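The motivation for stepwise sample construction can be illustrated with a toy example. The sketch below uses a centred moving average as a stand-in for VMD (an assumption made purely to keep the example short) and shows that decomposing the whole series at once lets a later value influence an earlier component value, whereas decomposing only the data available up to each step does not. The function names are hypothetical, and the four techniques compared in this study differ in details that are not reproduced here.

```python
def centered_trend(series, half_window=1):
    """Stand-in decomposition: centred moving average (NOT VMD)."""
    n = len(series)
    trend = []
    for t in range(n):
        lo, hi = max(0, t - half_window), min(n, t + half_window + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    return trend


def whole_series_component(series, t):
    """Component value at time t when the full series is decomposed at once."""
    return centered_trend(series)[t]


def stepwise_component(series, t):
    """Component value at time t when only data up to t are decomposed."""
    return centered_trend(series[:t + 1])[t]


def stepwise_samples(series, lag, start):
    """One sample per step t >= start; predictors use series[:t] only."""
    samples = []
    for t in range(start, len(series)):
        component = centered_trend(series[:t])
        samples.append((component[-lag:], series[t]))
    return samples
```

For example, with `a = [1, 2, 3, 4, 5]` and `b = [1, 2, 3, 100, 5]`, the whole-series component at index 2 differs between `a` and `b` even though the two series are identical up to that index, while the stepwise component is the same for both.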
Using VMD and the four sampling strategies, the rainfall data were decomposed into subsequences, which were then fed into the LSSVM model for simulation. A comparative analysis was carried out based on rainfall data from five stations in the North China Plain. In Fig. 3, the radial lines depict the correlation coefficient, the horizontal and vertical axes show the standard deviation, and the scattered points represent the various sampling techniques. The five stations perform better with the single-model stepwise decomposition approaches (SMFSD and SMSSD), whose predictions are closest to the observed values and have the smallest standard deviation. Across the four sampling approaches, the majority of correlation coefficients fall between 0.8 and 0.95.

Suitability of parameter α
The penalty factor α, a crucial parameter for denoising the precipitation sequence, also plays a significant role in forecasting accuracy. Three scenarios with α set to 100, 1000, and 2000 were considered. Since only the testing-period results can reflect actual predictive capability, Friedman tests were conducted on the testing-period results. The composite model based on the SMFSD technique consistently remains in the top two positions, maintaining its superiority over the other decomposition techniques. In contrast, the AVOA-LSSVM model ranks third. Therefore, the use of inappropriate decomposition techniques in monthly precipitation forecasting may reduce accuracy. The SMFSD technique emerges as the more suitable sample construction method for predicting monthly precipitation in the North China Plain.

Suitability across different stations
The above analysis focuses on the overall performance of the stepwise decomposition techniques in the North China Plain. However, because the region's vast geographical extent leads to differences in geographical and climatic conditions, the adaptability of each model may vary across stations. The average predictive success rate of each model is used to determine the optimal model for each station. According to the Hydrological Information Forecast Specification (GB/T 22482-2008), the permissible error is taken as 20% of the range of the measured values during the same period over multiple years.
According to this requirement, the average success rate of each model at each station is calculated for each month. Friedman tests are again used to rank the models. Table 2 records the optimal models for each station, along with their success rates in each month and the p-values from the Friedman test. At a significance level of 0.05, the model rankings differ significantly only for Huairou and Zhengding stations. The optimal monthly precipitation prediction model for Zhengding station is SMFSD-AVOA-LSSVM with α = 100, while for Huairou station it is SMFSD-AVOA-LSSVM with α = 1000. Using an 80% success rate as the preferred threshold to identify the months in which prediction is most reliable at each station, the results are as follows: June for Huairou station (87.8%), February (90.0%) and October (86.7%) for Hekou station, July for Jiaozuo station (88.9%), and May (88.9%) and August (90.0%) for Zhengding station. At Jingxian station, May (76.7%) and September (79.0%) come closest to the preferred threshold. In terms of the average success rate, only Zhengding station exceeds 60%. The average success rate at Hekou station is the lowest, only 48.8%, possibly due to insufficient learning caused by limited historical data. Table 2 provides the test results for the optimal models at each station, and Fig. 4 illustrates the training and prediction effects at each station. The best predictive performance in this study is observed at Huairou and Jingxian stations, while Hekou station exhibits the poorest predictive performance. Additionally, the prediction of precipitation sequences at each station performs well in the low-value range, while the predictive ability near extreme values requires further improvement.
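The success-rate calculation can be sketched as follows, under the reading that a forecast counts as successful when its absolute error does not exceed 20% of the multi-year observed range for the same calendar month. The function names and the dictionary-based data layout are assumptions made for illustration.

```python
def permissible_errors(obs_by_month, fraction=0.2):
    """Permissible error per calendar month: a fraction of the observed range."""
    return {month: fraction * (max(values) - min(values))
            for month, values in obs_by_month.items()}


def success_rate(obs, pred, months, tolerance):
    """Percentage of forecasts per calendar month within the permissible error."""
    hits, totals = {}, {}
    for o, p, m in zip(obs, pred, months):
        totals[m] = totals.get(m, 0) + 1
        if abs(p - o) <= tolerance[m]:
            hits[m] = hits.get(m, 0) + 1
    return {m: 100.0 * hits.get(m, 0) / totals[m] for m in totals}
```

As a usage example, if the June observations over several years range from 50 mm to 150 mm, the permissible error for June is 20 mm, and a June forecast of 125 mm against an observation of 100 mm would count as a miss.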

Comparison of simulation results
Fig. 4 compares the predicted and observed rainfall during the test period at each site, and Table 3 reports the evaluation indexes of the optimal models at each site. In terms of the error indicators used to assess prediction accuracy, the NSE ranges from 0.73 to 0.86, with Huairou and Jingxian having the best NSE; the RMSE of SMFSD-AVOA-LSSVM is less than 25 mm, with the lowest value being 11.62 mm in Jiaozuo. The IA reflects generalization ability; the closer it is to 1, the better. All five North China Plain stations have IAs greater than 0.96, demonstrating the good prediction performance and generalization capacity of SMFSD-AVOA-LSSVM. U1 measures the prediction ability of the model; the closer it is to 0, the better. The three stations with the best prediction skill and the lowest U1 scores are Huairou,

Figure 1. Construction steps of the monthly precipitation prediction model based on AVOA algorithm and VMD stepwise decomposition.

Figure 2. Distribution map of the study sites.

Figure 3. Taylor distribution for different sampling methods.

Figure 4. Optimal prediction results for each station in the North China Plain.

Table 1. Friedman test results for different α values. Significant values are in bold.

Table 2. Optimal models for each station based on monthly success rate.

Table 3. Results of optimal model evaluation indicators for each site.